ABSTRACT

Most previous studies of the speedup of parallel branch-and-bound algorithms
are based on the amount of work done in the parallel case and in the
sequential case14,17,18,23. Any evaluation of a parallel algorithm should
include both the execution time and the synchronization delay30. In this
paper, a finite population queueing model is used to capture the
synchronization delay in parallel branch-and-bound algorithms and to
quantitatively predict the behavior of their speedup. A program to solve the
Traveling Salesman Problem was written on a BBN Butterfly2 multiprocessor to
empirically demonstrate the credibility of this theoretical analysis.
Finally, we note that similar analyses can be applied to evaluate parallel AI
systems in which processes communicate through a shared global database.
1. Introduction


Branch-and-bound is one of the most general methods of solving combinatorial
optimization problems. It has wide applications in the fields of operations
research15 and artificial intelligence21. In general, the branch-and-bound
method repeatedly partitions the solution space into smaller and smaller
subspaces, and a lower bound (assuming minimization is to be achieved) of the
cost of each subspace is estimated. A subspace will no longer be partitioned
when its lower bound exceeds the known cost upper bound of the solution space.
The first found solution whose cost does not exceed any lower bound of the
partitioned solution subspaces is an optimal solution. For a review and formal
definition of the branch-and-bound method, see references10,28.

Recently there has been wide interest in parallelizing branch-and-bound or
combinatorial search methods on multicomputers. Imai et al.11 proposed a
parallel searching scheme for multiprocessors. Wah et al.29 described a
multicomputer architecture for solving combinatorial search problems. The
behavior of parallel branch-and-bound algorithms has been studied by several
researchers14,17,18,23. Three types of anomalies for the speedup of parallel
branch-and-bound algorithms were recognized. The speedup obtained when k
processors are used can be (a) greater than k (acceleration anomalies), (b)
between one and k (deceleration anomalies) or (c) less than one (detrimental
anomalies). However, acceleration and detrimental anomalies are unlikely to
occur in parallel best-first branch-and-bound algorithms unless there are a
large number of subproblems having the same lower bound; in addition, a nearly
linear speedup can be achieved for a large number of processors when the
problem size is large17,18,23.

One of the drawbacks of previous analyses of parallel branch-and-bound
algorithms is that the speedup is based on the amount of work done in the
parallel case and in the sequential case. Any evaluation of a parallel
algorithm should include both the execution time and the synchronization
delay30. In this paper, using a finite population queueing model, we
quantitatively predict the behavior of the speedup of parallel
branch-and-bound algorithms which are parallelized on shared memory
multiprocessors. A program to solve the Traveling Salesman Problem was written
on a BBN Butterfly2 multiprocessor to empirically demonstrate the credibility
of this analysis.

In Section 2, a finite population queueing model is reviewed. Section 3
describes a general approach, which serves as the basis for this paper, to
parallelizing branch-and-bound algorithms on multiprocessors. Section 4
describes how the speedup behavior of parallel branch-and-bound algorithms can
be analyzed using the finite population single server queueing model. Section
5 presents simulation results on a BBN Butterfly machine. Finally, a
conclusion which includes some suggestions about avoiding early saturation in
speedup is provided.

2.  Finite  Population  Queueing  Model 

We now review a queueing model which has been effectively used to predict the
performance of interactive time-sharing computer systems26. In Figure 2.1 we
have a closed network consisting of a single central server and a finite
number of “sources”. This queueing model operates in the following way: When a
source makes a request at the central server, the source “goes to sleep”. The
request, possibly after a queueing delay, then receives service at the central
server. When the request is finally served, the response is fed back to the
source. At this point, the source “wakes up” and starts generating a new
request. The time spent by the source in generating a new request is referred
to as its thinking time.

We assume that each source has an average thinking time of $\lambda^{-1}$ sec;
or equivalently, each source, when thinking, generates requests at an average
rate of $\lambda$. The central server has, for each request, an average
processing time of $\mu^{-1}$ or an average processing rate of $\mu$. We are
interested in the response time of the central server as seen by a source.

The average response time $W$ can be solved by equating the arrival rate of
requests to the central server to the departure rate from the central server,
or by Little’s Formula13:

$$W = \frac{n}{\mu (1 - p_0)} - \frac{1}{\lambda} \tag{2.1}$$

where $n$ is the number of sources and $p_0$ is the probability that there is
no outstanding request at the central server. Define $\rho = \lambda / \mu$.
Equation (2.1) then can be rewritten as

$$W = \frac{1}{\lambda} \left( \frac{n \rho}{1 - p_0} - 1 \right) \tag{2.2}$$

Since $0 < 1 - p_0 < 1$,

$$W > \frac{1}{\mu} \left( n - \frac{1}{\rho} \right) \tag{2.3}$$

When the number of sources is large, $p_0$ can become very small. Therefore
Equation (2.3) can be considered as an asymptotic approximation for $W$.


Note that the distributions of the thinking time of the sources and the
processing time at the central server do not play a role in the derivation of
Equation (2.2). However, the above informal mean flow analysis is strictly
true only for certain types of service and thinking time distributions, namely
an exponential distribution or rational distribution with round-robin
scheduling13. Nevertheless, this model has been found to be surprisingly
robust in evaluating systems which violate most of the strict assumptions26.

If we assume that the thinking time and the processing time are both
exponentially distributed, then the probability $p_0$ can be determined
analytically and is given by12

$$p_0 = \left[ \sum_{k=0}^{n} \frac{n!}{(n-k)!} \rho^{k} \right]^{-1} \tag{2.4}$$

In Figure 2.2, $W$ as a function of $n$ with $\rho^{-1}$ having a value of 40
is plotted. Kleinrock13 defined the saturation number, which we denote by
$n^*$, as

$$n^* = 1 + \frac{1}{\rho} \tag{2.5}$$

Indeed, $n^*$ is the maximum number of sources for which requests can be
scheduled without interference.
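The quantities of this section can be evaluated numerically. The following is a minimal sketch, assuming exponentially distributed thinking and processing times so that Equation (2.4) applies; the function names are illustrative and do not come from the paper.

```python
from math import fsum


def idle_probability(n, rho):
    """p0 of Equation (2.4): probability of no outstanding request."""
    term = 1.0
    terms = [term]
    for k in range(1, n + 1):
        # n!/(n-k)! * rho^k is built incrementally to avoid huge factorials.
        term *= (n - k + 1) * rho
        terms.append(term)
    return 1.0 / fsum(terms)


def response_time(n, lam, mu):
    """Average response time W of Equation (2.1)."""
    p0 = idle_probability(n, lam / mu)
    return n / (mu * (1.0 - p0)) - 1.0 / lam


def asymptote(n, lam, mu):
    """Lower bound of Equation (2.3), used as the asymptote for large n."""
    return (n - mu / lam) / mu


def saturation_number(lam, mu):
    """Saturation number n* of Equation (2.5)."""
    return 1.0 + mu / lam


if __name__ == "__main__":
    lam, mu = 1.0, 40.0  # rho^-1 = 40, the value used for Figure 2.2
    for n in (1, 20, 41, 80):
        print(n, response_time(n, lam, mu), asymptote(n, lam, mu))
    print("saturation number:", saturation_number(lam, mu))
```

With a single source there is no queueing, so the computed response time reduces to the bare processing time $\mu^{-1}$; for large $n$ it approaches the asymptote of Equation (2.3).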

3.  Parallel  Best-First  Branch-and-Bound  Algorithms  with  a  Global 
OPEN  list 

In this paper, we assume that the best-first branch-and-bound algorithms are
parallelized on a tightly-coupled multiprocessor. A global OPEN list, which
contains the nodes that have been generated but not yet examined in the search
tree, is shared by all the processors. Each node in the OPEN list is a
representation of the root of a subproblem. Initially the OPEN list contains
only a representation of the entire solution space. Each processor then
repeatedly removes the node with the best cost from the OPEN list, solves the
node if it is a solution or decomposes the node into one or more child nodes,
estimates their cost (and possibly eliminates some of them by applying the
dominance relation), and then inserts them into the global OPEN list in the
appropriate positions. The basic task of each processor is regular iterations
of deletion-decomposition or decomposition-insertion. We will refer to an
entire operation of deletion-decomposition or decomposition-insertion as one
iteration.

The time spent to insert/delete a node into/from the OPEN list depends on the
length of the OPEN list and the discipline used. The length of the OPEN list
can become very large and hence the time taken to delete from and insert into
the OPEN list will not be negligible when compared to the time spent in
decomposition. A processor trying to insert/delete a node into/from the OPEN
list cannot proceed when any other processor is holding the OPEN list. This
introduces a synchronization delay which is ubiquitous in the realization of
parallel algorithms. Multiprocessor systems usually provide some locking
mechanisms, e.g. spin locks and semaphores5, for users to implement mutually
exclusive code. Spin locking, or busy-waiting, is efficient for infrequent and
short locking. However, for frequent locking which takes nonnegligible time,
spin locks can result in a serious loss of memory bandwidth and network
bandwidth. Moreover, if the accessing order cannot be preserved, programs may
execute for arbitrary times since the performance of heuristic search strongly
depends on the order of node examination. Semaphores, on the other hand, are
recommended for synchronization of the OPEN list, in spite of the possible
inefficiency due to frequent context switching.

The termination of parallel branch-and-bound algorithms must be handled
carefully. To ensure correctness, a current best solution (sometimes referred
to as the incumbent in the literature) is kept in a globally shared memory.
The parallel branch-and-bound algorithm is terminated when either one of the
following conditions is satisfied: (a) There are no active processes and the
OPEN list is empty, (b) There are no active processes and no nodes in the OPEN
list have better cost than the current best solution. In either case the
current best solution (if any) is the global solution. Note that the order of
the evaluation of the arguments to the Boolean operator "and" above is also
important. Here we assume the arguments of "and" are evaluated from left to
right. In condition (a), if the evaluation order of "and" is from right to
left, then it is possible that the OPEN list is found to be empty before some
active process inserts nodes into the OPEN list and terminates. Similar
arguments hold for condition (b).
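The iteration and the termination conditions described above can be sketched as follows. This is only an illustration under stated assumptions: it uses Python threads and a condition variable rather than the semaphores discussed in the text, the cost matrix is a hypothetical 4x4 assignment problem rather than the TSP, and all names are invented for the example. Because the termination test here is made while holding the lock on the OPEN list, the evaluation-order hazard mentioned above does not arise in this particular sketch.

```python
import heapq
import threading


def parallel_best_first(cost, num_workers=4):
    """Best-first branch-and-bound with one global OPEN list (a heap)."""
    m = len(cost)
    row_min = [min(row) for row in cost]

    def bound(state):
        # Cost of the rows fixed so far plus the row minima of the rest.
        g = sum(cost[r][c] for r, c in enumerate(state))
        return g + sum(row_min[len(state):])

    open_list = [(bound(()), ())]        # global OPEN list
    incumbent = [float("inf"), None]     # current best solution
    active = [0]                         # processes currently decomposing
    cond = threading.Condition()         # guards all three of the above

    def worker():
        while True:
            with cond:
                while True:
                    if open_list and open_list[0][0] < incumbent[0]:
                        _, state = heapq.heappop(open_list)   # deletion
                        active[0] += 1
                        break
                    if active[0] == 0:
                        # Conditions (a)/(b): nothing useful in OPEN and
                        # no active process that could still insert nodes.
                        cond.notify_all()
                        return
                    cond.wait()
            # Decomposition ("thinking time") happens outside the lock.
            children = [state + (c,) for c in range(m) if c not in state]
            scored = [(bound(child), child) for child in children]
            with cond:
                for b, child in scored:
                    if b >= incumbent[0]:
                        continue                      # pruned by incumbent
                    if len(child) == m:
                        incumbent[0], incumbent[1] = b, child
                    else:
                        heapq.heappush(open_list, (b, child))  # insertion
                active[0] -= 1
                cond.notify_all()

    threads = [threading.Thread(target=worker) for _ in range(num_workers)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return incumbent[0], incumbent[1]


# Hypothetical cost matrix: row r is assigned column state[r].
EXAMPLE_COST = [[9, 2, 7, 8], [6, 4, 3, 7], [5, 8, 1, 8], [7, 6, 9, 4]]

if __name__ == "__main__":
    print(parallel_best_first(EXAMPLE_COST, num_workers=4))
```

The time each worker spends between the two locked regions plays the role of the thinking time in Section 2, and the locked heap operations play the role of the central server.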

The parallelizing scheme we assume here was first proposed by Imai et al.11.
Similar methods were later used by others20,24 in implementing best-first
branch-and-bound algorithms on shared memory multiprocessors.

4.  A  Tight  Upper  Bound  for  the  Speedup  of  Parallel  Best-First 
Branch-and-Bound  Algorithms 

In this section, we show how the finite population queueing model can be used
to predict the speedup of parallel best-first branch-and-bound algorithms.

Recall that in the above parallel branch-and-bound algorithm, each processor
repeatedly decomposes a node and performs OPEN insertion or deletion. The OPEN
list can be accessed by at most one processor at a time; the other conflicting
processors must be blocked. It is not difficult to recognize the analogy
between the parallel branch-and-bound algorithm and the finite population
queueing model discussed in Section 2. The processors are the finite sources
of this queueing model. Each processor spends some time (thinking time)
decomposing a node and then sends a request (insertion/deletion) to the OPEN
list. An FCFS queue can be used to handle the requests at the OPEN list. The
time spent in insertion/deletion can be considered as the central server’s
processing time. Note that there is no processor dedicated to be the central
server.

Let us again assume that the average time spent for a processor to decompose a
node is $\lambda^{-1}$ sec. and the time taken to insert/delete a node
into/from the OPEN list, when it is free, has an average of $\mu^{-1}$ sec.
The node decomposition time will not be affected by other processes running
concurrently, while the insertion/deletion time could be delayed by other
processes trying to access the OPEN list at the same time. Therefore, the
average time spent during an iteration
(decomposition-insertion/decomposition-deletion) is
$\lambda^{-1} + \mu^{-1}$ in the sequential case and $\lambda^{-1} + W$ in the
parallel case, where $W$ is the average response time predicted by Equation
(2.1).

Define $I_1$ as the total number of iterations executed in a best-first
branch-and-bound algorithm when a single processor is used and $I_n$ the total
number of iterations executed when $n$ processors are used. If detrimental or
acceleration anomalies do not exist, $I_n$ is at least as large as $I_1$ and
they are very close when the problem size is large17,18,23.


The speedup $S(n)$ for $n$ processors is the ratio between the execution times
when using one processor and using $n$ processors. Hence,

$$S(n) = \frac{I_1 \times (\lambda^{-1} + \mu^{-1})}{I_n \times (\lambda^{-1} + W(n)) / n} \le n \times \frac{\lambda^{-1} + \mu^{-1}}{\lambda^{-1} + W(n)} \tag{4.1}$$

where $W(n)$ is the average response time of the requests at the OPEN list
when $n$ processors are used. Using Equation (2.3) as an asymptotic
approximation for $W$ in Equation (4.1) and letting $\rho = \lambda / \mu$, we
obtain

$$S(n) \le n \times \frac{1 + \rho}{1 + \left( n - \frac{1}{\rho} \right) \rho}$$

or

$$S(n) \le 1 + \frac{1}{\rho} \tag{4.2}$$

In Figure 4.1, speedup $S$ as a function of $n$ is plotted with a $\rho^{-1}$
value of 40. Note that the speedup saturates very quickly when the number of
processors exceeds the saturation number defined in Equation (2.5), though we
have a nearly linear speedup before that many processors are used. Here, the
saturation number = 41.


Equation (4.1) is obtained with the following assumptions:

a) Equal work between the sequential algorithm and the parallel algorithm,
   i.e. $I_1 = I_n$.

b) Decomposable computation is fully parallelized among available processors.

c) No process scheduling overhead.

d) No overhead of locking primitives.

e) No hardware level contention, e.g. memory contention or switch contention.

These assumptions will not be true in practice but can be achieved quite
closely when not many processors are used. Hence the result of Equation (4.1)
can be considered as a tight upper bound.
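The bound of Equation (4.1) can be tabulated directly. The sketch below assumes exponential thinking and processing times so that $p_0$ can be taken from Equation (2.4); the function names are illustrative only.

```python
from math import fsum


def idle_probability(n, rho):
    """p0 of Equation (2.4), with n!/(n-k)! * rho^k built incrementally."""
    term = 1.0
    terms = [term]
    for k in range(1, n + 1):
        term *= (n - k + 1) * rho
        terms.append(term)
    return 1.0 / fsum(terms)


def speedup_bound(n, lam, mu):
    """Right-hand side of Equation (4.1) with W(n) from Equation (2.1)."""
    p0 = idle_probability(n, lam / mu)
    w = n / (mu * (1.0 - p0)) - 1.0 / lam      # Equation (2.1)
    return n * (1.0 / lam + 1.0 / mu) / (1.0 / lam + w)


if __name__ == "__main__":
    lam, mu = 1.0, 40.0   # rho^-1 = 40, as in Figure 4.1
    for n in (1, 10, 20, 41, 60, 100):
        print(n, round(speedup_bound(n, lam, mu), 3))
```

Substituting Equation (2.1) into the bound shows that it equals $(1 + 1/\rho)(1 - p_0)$, so it rises with $n$ and levels off at the value $1 + 1/\rho$ of Equation (4.2) as $p_0$ tends to zero.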

5.  A  Simulation  Result 

To empirically demonstrate this analysis, a parallel branch-and-bound
algorithm was implemented on the BBN Butterfly2 using Butterfly Lisp27 to
solve the Traveling Salesman Problem (TSP). Given a set of cities and the
distances between each pair of cities, the TSP is to find a complete shortest
tour which visits every city once and only once. Mohan20 solved the TSP on the
Cm* multiprocessor. Although the speedup he obtained was less than 8 when 16
processors were used, he estimated that an almost linear speedup could be
achieved for up to 12 processors if a hardware contention problem (referred to
as cluster contention in Cm*) could be factored out. Rao et al.24 recently
reported an encouraging result in which they obtained a speedup of 7 on 8
processors when solving the TSP on a Sequent Balance 8000. The parallelizing
strategy that Mohan and Rao et al. used was basically the same as that
illustrated in Section 3.

Unlike Mohan and Rao et al., we did not use Little et al.’s algorithm19. We
used an algorithm based on the assignment problem22. Let $c_{ij}$ be the
distance between city $i$ and city $j$. Then the $m$-city TSP can be stated
as:

minimize

$$\sum_{j=1}^{m} \sum_{i=1}^{m} c_{ij} x_{ij}$$

subject to

$$x_{ij} = 0 \text{ or } 1, \quad 1 \le i \le m, \quad 1 \le j \le m \tag{5.1}$$

$$\sum_{i=1}^{m} x_{ij} = 1, \quad 1 \le j \le m \tag{5.2}$$

$$\sum_{j=1}^{m} x_{ij} = 1, \quad 1 \le i \le m \tag{5.3}$$

If $x_{ij} = 1$, the route from city $i$ to city $j$ is chosen in the tour.
Constraints (5.2) and (5.3) state that exactly one chosen route comes into
each city and exactly one departs from it. However, constraints (5.1), (5.2)
and (5.3) do not fully characterize the solution of the TSP yet, since the
solution so obtained may contain several disjoint cycles. Another constraint
must be added to exclude those solutions which contain disjoint cycles.
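The disjoint-cycle problem can be made concrete with a small check. The sketch below assumes an assignment satisfying (5.1) to (5.3) is given as a successor list with cities numbered from 0; this representation and the function name are choices made for the example, not taken from the paper.

```python
def is_single_tour(successor):
    """Return True if the assignment forms one cycle through all cities.

    successor[i] is the city visited immediately after city i, i.e. the
    column j with x_ij = 1 in row i.
    """
    m = len(successor)
    city = 0
    for steps in range(1, m + 1):
        city = successor[city]
        if city == 0:
            # Back at the start: a valid tour must have used all m routes.
            return steps == m
    return False  # never returned to city 0: not a valid assignment


if __name__ == "__main__":
    print(is_single_tour([3, 2, 0, 1]))  # one cycle 0 -> 3 -> 1 -> 2 -> 0
    print(is_single_tour([1, 0, 3, 2]))  # two disjoint cycles
```

Both example inputs satisfy constraints (5.2) and (5.3), but only the first one is a tour.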


Figure 5.1 shows an example of a 4-city TSP with cost matrix

$$\begin{bmatrix} \infty & 20 & 30 & 10 \\ 5 & \infty & 7 & 14 \\ 8 & 12 & \infty & 3 \\ 15 & 6 & 11 & \infty \end{bmatrix}$$

The algorithm starts by considering the alternative routes departing from city
1, then city 2, and so forth. The state of a node in the search tree is
represented as a tuple $(x_{1,i_1}, x_{2,i_2}, \ldots, x_{k,i_k})$. The cost
of the node is the sum of $g$ and $h$, where
$g = c_{1,i_1} + c_{2,i_2} + \cdots + c_{k,i_k}$ and $h$ = the sum of the
minimum of each column of the $(m-k) \times (m-k)$ matrix obtained by deleting
rows 1 through $k$ and columns $i_1$ through $i_k$ of the cost matrix.
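The node cost $g + h$ can be computed for the 4-city example as follows. This is a minimal sketch with cities indexed from 0; the function and variable names are illustrative, and the check that excludes disjoint cycles is deliberately left out.

```python
INF = float("inf")

# Cost matrix of the 4-city example, with cities numbered 0..3.
COST = [
    [INF, 20, 30, 10],
    [5, INF, 7, 14],
    [8, 12, INF, 3],
    [15, 6, 11, INF],
]


def node_cost(cost, chosen):
    """Cost g + h of a node whose state fixes the routes leaving rows 0..k-1.

    chosen[r] is the destination column picked for row r.
    """
    m = len(cost)
    k = len(chosen)
    # g: total distance of the routes already chosen.
    g = sum(cost[r][c] for r, c in enumerate(chosen))
    # h: column minima of the matrix left after deleting the first k rows
    # and the chosen columns.
    free_columns = [c for c in range(m) if c not in chosen]
    h = sum(min(cost[r][c] for r in range(k, m)) for c in free_columns)
    return g + h


if __name__ == "__main__":
    print(node_cost(COST, ()))             # root node: g = 0
    print(node_cost(COST, (3,)))           # route city 1 -> city 4 chosen
    print(node_cost(COST, (3, 2, 0, 1)))   # complete assignment: h = 0
```

For the root node the estimate is the sum of the four column minima; once every row has a chosen column, the estimate reduces to the tour length $g$.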


The OPEN list is organized as a heap9 in our implementation. Figure 5.2 shows
the speedup curve we obtained when solving a 12-city TSP. The speedup is
almost linear up to 8 processors but only reaches a peak of 12 when 20
processors are used. In general, the number of iterations executed is slightly
higher when more processors are used. In Figure 5.3 the speedup curve is
adjusted by the number of iterations. That is, if $S(n)$ is the measured
speedup for $n$ processors then the adjusted speedup $S(n)^*$ is given by

$$S(n)^* = S(n) \times \frac{I_n}{I_1}$$

The decomposition time and the OPEN insertion/deletion time were sampled
during execution when 8 processors were used. Figure 5.4 and Figure 5.5 show
the histograms of the OPEN insertion/deletion time and the decomposition time
distributions†. According to the mean decomposition time and mean
insertion/deletion time, the $\rho^{-1}$ value is 14.6. The speedup should
saturate at 15.6 by our analysis. However, readers should not take the figures
we obtained too seriously. Some of the measured decomposition times actually
include the time spent by the incremental garbage collector4 in the Blisp
system‡, and hence the mean decomposition time measured will be slightly
higher than it should be. In addition, the semaphores we used§ contain an
internal critical region which also introduces non-negligible synchronization
delay.

Note that higher speedup can be expected when solving a TSP with a larger
number of cities. This is because the decomposition time is proportional to
the problem size and a larger mean decomposition time will increase the value
of $\rho^{-1}$.

† The BBN Blisp we used was a beta release from BBN. No compiler was available
when this paper was prepared. The interpreter was very slow in absolute terms.

‡ This can be seen from the extremely high time occurrences in Figure 5.5.

§ The semaphores we used were implemented using futures in Blisp. Futures are
very expensive in the current Blisp version.

6. Discussion and Conclusion

We have demonstrated that the finite population queueing model can be used to
accurately analyze the speedup of parallel branch-and-bound algorithms. The
accuracy is obtained because the synchronization delay is taken into
consideration. Certainly, a more accurate analysis should include
architectural inefficiency, e.g. memory contention, bus contention, switch
contention, etc., and system overhead, e.g. scheduling overhead, overhead of
synchronization primitives, etc. However, this kind of analysis is
machine-dependent and its long-term significance for the design of parallel
algorithms is unclear. There is hope that new technology will reduce the
architectural inefficiency and system overhead to an insignificant amount25.

The results we obtained from theoretical analysis and simulation show that
because of the synchronization delay, the speedup of parallel algorithms can
saturate very quickly after a nearly linear speedup up to a certain number of
processors. This apparently pessimistic result can be used to design more
efficient parallel algorithms. The saturation number defined in Equation (2.5)
can be considered as the maximum number of processors to be used before
speedup saturates. Beyond that, parallelism does not help. Extending the
saturation point is equivalent to enlarging the value of $\rho^{-1}$. This can
be achieved by increasing $\lambda^{-1}$ or decreasing $\mu^{-1}$. We could
raise $\lambda^{-1}$ by increasing the granularity of the node decomposition.
One possible way is to execute more iterations before a process communicates
with the global OPEN list. For heuristic search, this must be done carefully
since larger granularity means less heuristic guidance. $\mu^{-1}$ can be
lowered by better insertion and deletion disciplines for the global OPEN list.
For example, heap insertion and deletion is superior to linear insertion and
deletion.

Other optimizing methods are also possible. The OPEN list can be partitioned
into several disjoint regions according to different ranges of cost estimation
values. An extreme case is that each region corresponds to a single cost
estimation value so that insertion and deletion can be done in $O(1)$ time.
This method was actually used by Rao et al.24 to solve a 15-puzzle problem.
They took advantage of the fact that the cost estimation function has a small
range and can be predetermined. Since each region is now locked separately,
the OPEN list bottleneck could potentially be relieved by a multiplicative
factor. The performance of the algorithms based on a multiple-region OPEN list
can be formally analyzed by a more general finite source multiple server
queueing model.
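A multiple-region OPEN list in the extreme case of one region per cost value might look like the sketch below. It assumes integer cost estimates in a small, known range, uses Python threading locks in place of the semaphores discussed earlier, and the class and method names are invented for illustration. It is not the implementation of Rao et al.

```python
import threading
from collections import deque


class BucketedOpenList:
    """OPEN list with one separately locked region per integer cost value."""

    def __init__(self, max_cost):
        self.buckets = [deque() for _ in range(max_cost + 1)]
        self.locks = [threading.Lock() for _ in range(max_cost + 1)]

    def insert(self, cost, node):
        # Only the region for this cost value is locked: O(1) insertion.
        with self.locks[cost]:
            self.buckets[cost].append(node)

    def delete_best(self):
        # Scan regions from the lowest cost upward and take the first node.
        # The scan is bounded by the (small, fixed) number of regions.
        for cost, (lock, bucket) in enumerate(zip(self.locks, self.buckets)):
            with lock:
                if bucket:
                    return cost, bucket.popleft()
        return None  # every region is empty


if __name__ == "__main__":
    open_list = BucketedOpenList(max_cost=10)
    open_list.insert(3, "a")
    open_list.insert(1, "b")
    open_list.insert(3, "c")
    print(open_list.delete_best())
    print(open_list.delete_best())
```

Under concurrent use this sketch is only approximately best-first: a node may be inserted into a low-cost region just after another processor's scan has passed it. Whether that relaxation is acceptable depends on the search being parallelized.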

If the range of the cost estimation function is not small or cannot be
predetermined, a possible way to alleviate the bottleneck is to organize the
global OPEN list as a concurrent B-tree1,16 so that insertions and deletions
performed in different subtrees can possibly be executed in parallel.

Finally we note that a similar analysis can be applied to parallel AI systems
whose main communication medium is a global shared database, e.g. a
blackboard6.
Figure 2.2 Average response time curve obtained from Equations (2.1) and
(2.3) with $1/\rho = 40$.

Figure 5.3 Adjusted speedup curve from Figure 5.2 by the number of iterations.


